Incremental Kernel Fuzzy c-Means
Abstract
The size of everyday data sets is outpacing the capability of computational hardware to analyze them. Social networking and mobile computing alone are producing data sets that grow by terabytes every day. Because these data often cannot be loaded into a computer's working memory, most literal algorithms (algorithms that require access to the full data set) cannot be used. Clustering is one type of pattern recognition and data mining method used to analyze databases; thus, clustering algorithms that can be used on large data sets are important and useful. We focus on a specific type of clustering: kernelized fuzzy c-means (KFCM). The literal KFCM algorithm has a memory requirement of O(n^2), where n is the number of objects in the data set, because it operates on the full n × n kernel matrix. Thus, even data sets that have nearly 1,000,000 objects require terabytes of working memory, which is infeasible for most computers. One way to attack this problem is to use incremental algorithms, which sequentially process chunks or samples of the data and combine the results from each chunk. Here we propose three new incremental KFCM algorithms: rseKFCM, spKFCM, and oKFCM. We assess the performance of these algorithms by, first, comparing their clustering results to those of the literal KFCM and, second, showing that these algorithms can produce reasonable partitions of large data sets. In summary, rseKFCM is the most efficient of the three, exhibiting significant speedup at low sampling rates. The oKFCM algorithm seems to produce the most accurate approximation of KFCM, but at the cost of low efficiency. Our recommendation is to use rseKFCM at the highest sample rate allowable for your computational and problem needs.
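The memory argument in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes a dense kernel matrix stored in 8-byte floating point without exploiting symmetry.

```python
import math


def kernel_matrix_bytes(n_objects: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold a dense n x n kernel matrix.

    Assumption: every entry is stored (no symmetry or sparsity savings)
    and each entry takes `bytes_per_entry` bytes (8 for float64).
    """
    return n_objects * n_objects * bytes_per_entry


def max_chunk_size(memory_budget_bytes: int, bytes_per_entry: int = 8) -> int:
    """Largest chunk size s whose s x s kernel matrix fits in the budget."""
    return math.isqrt(memory_budget_bytes // bytes_per_entry)


if __name__ == "__main__":
    n = 1_000_000
    full = kernel_matrix_bytes(n)
    print(f"full kernel matrix for n={n}: {full / 1e12:.1f} TB")

    budget = 8 * 1024**3  # hypothetical 8 GiB of working memory
    print(f"largest chunk that fits in 8 GiB: {max_chunk_size(budget)} objects")
```

Under these assumptions, one million objects need 8 × 10^12 bytes for the kernel matrix alone, while a chunk of a few tens of thousands of objects fits in ordinary working memory. That gap is the motivation for processing the data in chunks or samples.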
Similar resources
Kernel-based fuzzy and possibilistic c-means clustering
The 'kernel method' has attracted great attention with the development of support vector machine (SVM) and has been studied in a general way. In this paper, this 'method' is extended to the well-known fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms. It is realized by substitution of a kernel-induced distance metric for the original Euclidean distance, and the corresponding algori...
A Novel Kernel Based Fuzzy C Means Clustering With Cluster Validity Measures
Clustering algorithms are an integral part of both computational intelligence and pattern recognition. They are unsupervised methods for classifying data into subgroups based on inter-cluster and intra-cluster similarity. Among fuzzy clustering algorithms, the most widely used is the fuzzy c-means (FCM) algorithm. The FCM algorithm is efficient only for spherical data when the input of the data stru...
Different Objective Functions in Fuzzy c-Means Algorithms and Kernel-Based Clustering
An overview of fuzzy c-means clustering algorithms is given where we focus on different objective functions: they use regularized dissimilarity, entropy-based function, and function for possibilistic clustering. Classification functions for the objective functions and their properties are studied. Fuzzy c-means algorithms using kernel functions is also discussed with kernelized cluster validity...
Spatial Bias Correction Based on Gaussian Kernel Fuzzy C Means in Clustering
Clustering is the process of grouping data objects into set of disjointed classes called clusters so that objects within a class are highly similar to one another and dissimilar to the objects in other classes. K-means (KM) and Fuzzy c-means (FCM) algorithms are popular and powerful methods for cluster analysis. However, the KM and FCM algorithms have considerable trouble in a noisy environment...
Dynamic Incremental Fuzzy C-Means Clustering
Researchers have observed that multistage clustering can accelerate convergence and improve clustering quality. Two-stage and two-phase fuzzy C-means (FCM) algorithms have been reported. In this paper, we demonstrate that the FCM clustering algorithm can be improved by the use of static and dynamic single-pass incremental FCM procedures. Keywords-Clustering; Fuzzy C-Means Clustering; Incrementa...